The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method
Abstract
In this paper, we present a detailed analysis and performance evaluation of the Dynamic Hybrid GRACE Hash Join Method (DHGH Method) when the tuple distribution across buckets is unbalanced. Conventional hash join methods specify the tuple distribution across buckets statically. However, the actual distribution may differ from the estimate, because join operations are often applied together with selection operations. When the tuple distribution across buckets is unbalanced, a join performed with the Hybrid Hash Join Method (HH Method) costs more than it would in the ideal, balanced case. The DHGH Method, by contrast, selects the buckets to destage dynamically, so it achieves the same performance as the ideal case even when the tuple distribution is unbalanced, for example under Zipf-like distributions. We analyze the total I/O cost of a join operation for various numbers of buckets. The results show that the number of buckets should be determined by the tuple distribution across buckets rather than by the size of the source relation. In particular, it is better to partition the source relation into a large number of small buckets than into the smaller number of buckets, each nearly filling main memory, that the HH Method adopts.
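To make the bucket-count trade-off concrete, here is a toy simulation in Python. It is not the paper's cost model: it assumes a build relation of N tuples whose join keys follow a Zipf-like law, a memory that holds M tuples, a simple modulo split function, and a destaging policy that evicts the largest memory-resident bucket whenever memory overflows. All of these, and the function names `zipf_keys` and `dynamic_destage_spill`, are hypothetical choices for illustration. The simulation counts build tuples spilled to disk, which can never be fewer than N - M.

```python
import random


def zipf_keys(n_tuples, n_values, theta, seed=0):
    """Draw join-key values whose frequencies follow a Zipf-like law (rank ** -theta)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** theta) for rank in range(1, n_values + 1)]
    return rng.choices(range(n_values), weights=weights, k=n_tuples)


def dynamic_destage_spill(keys, n_buckets, memory):
    """Partition keys into n_buckets, keeping buckets in memory until it overflows.

    On overflow, one resident bucket is destaged (written to disk); afterwards every
    tuple hashed to that bucket goes straight to disk. Returns the number of build
    tuples written to disk. The victim policy (largest resident bucket) is an
    assumption of this sketch, not necessarily the DHGH Method's rule.
    """
    resident = [0] * n_buckets        # tuples of each bucket currently in memory
    on_disk = [False] * n_buckets     # True once a bucket has been destaged
    used = 0                          # tuples currently held in memory
    spilled = 0                       # tuples written to disk so far
    for value in keys:
        b = value % n_buckets         # hypothetical split function
        if on_disk[b]:
            spilled += 1
            continue
        resident[b] += 1
        used += 1
        while used > memory:
            # Dynamic choice: evict whichever resident bucket is largest right now.
            victim = max((i for i in range(n_buckets) if not on_disk[i]),
                         key=lambda i: resident[i])
            on_disk[victim] = True
            used -= resident[victim]
            spilled += resident[victim]
            resident[victim] = 0
    return spilled


if __name__ == "__main__":
    N, M = 100_000, 30_000            # build tuples, memory capacity in tuples
    keys = zipf_keys(N, n_values=10_000, theta=1.0, seed=1)
    lower_bound = N - M               # no strategy can spill fewer tuples than this
    for n_buckets in (4, 16, 64, 256, 1024):
        spilled = dynamic_destage_spill(keys, n_buckets, M)
        print(f"B={n_buckets:5d}  spilled={spilled:6d}  excess={spilled - lower_bound:6d}")
```

Each spilled tuple is written once and read back once, so the extra build-side I/O is roughly twice the spilled count; the `excess` column compares bucket counts against the unavoidable N - M. The probe relation and page granularity are deliberately ignored to keep the sketch short.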
Similar articles
Hash-Partitioned Join Method Using Dynamic Destaging Strategy
In this paper, we propose a new hash-partitioned join method using a dynamic destaging strategy for large-scale databases. The traditional hash-partitioned join methods such as the Hybrid Hash Join Method assume that the size of each bucket can be controlled by selecting a split function, and the characteristics of the buckets are statically specified. For materializing this assump...
On a Three-Way Hash Join Algorithm
We develop hash-based algorithms for computing a three-way join. The method involves hashing all three relations into buckets, and then joining buckets in main memory, three buckets at a time. Compared to two cascaded hash joins, the algorithms avoid materializing an intermediate result. We present a cost model for this approach, from which we identify the range of parameters for queries that ...
Adapting Hash Joins For Modern Processors
Hash join algorithms are crucial to the performance of modern database systems. Conventional hash joins exhibit poor memory system performance on modern processors because their key data structure, the bucket-chain hash table, is ill-suited for the performance characteristics of out-of-order processors with large cache hierarchies. Whereas prior research has considered a variety of optimization...
Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC)
The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment under development at the University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other join algorithms, a hash-based algorithm is quite efficient and easily parallelized, and has been employed by many database machines. However, in the presence of da...
Towards Eliminating Random I/O in Hash Joins
The widening performance gap between CPU and disk is significant for hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join times by reducing random I/O. We study how current algorithms incur random I/O, and propose a new hash join method, Seq+, that converts much of the random I/O t...